Activity 2 - UNITED KINGDOM ROAD ACCIDENT RECORDS REPORT


Data Analyst : Jomel Tomeo

Initialization of Core Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import folium
from folium.plugins import HeatMap
from scipy.stats import f_oneway
import warnings
warnings.filterwarnings('ignore')

Loading Dataset(s) into DataFrames

Requirements:

  • DataFrame Identifier
  • Dataset File Location
In [2]:
# Forward slashes in the path work on Windows, macOS, and Linux
accident = pd.read_csv("datasets/uk_road_accident.csv")
accident
Out[2]:
  | Index | Accident_Severity | Accident Date | Latitude | Light_Conditions | District Area | Longitude | Number_of_Casualties | Number_of_Vehicles | Road_Surface_Conditions | Road_Type | Urban_or_Rural_Area | Weather_Conditions | Vehicle_Type
0 | 200701BS64157 | Serious | 5/6/2019 | 51.506187 | Darkness - lights lit | Kensington and Chelsea | -0.209082 | 1 | 2 | Dry | Single carriageway | Urban | Fine no high winds | Car
1 | 200701BS65737 | Serious | 2/7/2019 | 51.495029 | Daylight | Kensington and Chelsea | -0.173647 | 1 | 2 | Wet or damp | Single carriageway | Urban | Raining no high winds | Car
2 | 200701BS66127 | Serious | 26-08-2019 | 51.517715 | Darkness - lighting unknown | Kensington and Chelsea | -0.210215 | 1 | 3 | Dry | NaN | Urban | NaN | Taxi/Private hire car
3 | 200701BS66128 | Serious | 16-08-2019 | 51.495478 | Daylight | Kensington and Chelsea | -0.202731 | 1 | 4 | Dry | Single carriageway | Urban | Fine no high winds | Bus or coach (17 or more pass seats)
4 | 200701BS66837 | Slight | 3/9/2019 | 51.488576 | Darkness - lights lit | Kensington and Chelsea | -0.192487 | 1 | 2 | Dry | NaN | Urban | NaN | Other vehicle
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
660678 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car
660679 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car
660680 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car
660681 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car
660682 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car

660683 rows × 14 columns

Descriptive Analytics


In [3]:
accident.describe()
Out[3]:
Latitude Longitude Number_of_Casualties Number_of_Vehicles
count 660658.000000 660657.000000 660683.000000 660683.000000
mean 52.553845 -1.431256 1.356970 1.831181
std 1.406684 1.383305 0.824734 0.715285
min 49.914430 -7.516225 1.000000 1.000000
25% 51.490690 -2.332472 1.000000 1.000000
50% 52.315646 -1.411916 1.000000 2.000000
75% 53.453473 -0.232870 1.000000 2.000000
max 60.757544 1.762010 68.000000 32.000000
In [4]:
accident.describe().T
Out[4]:
count mean std min 25% 50% 75% max
Latitude 660658.0 52.553845 1.406684 49.914430 51.490690 52.315646 53.453473 60.757544
Longitude 660657.0 -1.431256 1.383305 -7.516225 -2.332472 -1.411916 -0.232870 1.762010
Number_of_Casualties 660683.0 1.356970 0.824734 1.000000 1.000000 1.000000 1.000000 68.000000
Number_of_Vehicles 660683.0 1.831181 0.715285 1.000000 1.000000 2.000000 2.000000 32.000000

Checking Null Values


In [5]:
accident.isnull().sum()
Out[5]:
Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4519
Urban_or_Rural_Area           15
Weather_Conditions         14127
Vehicle_Type                   0
dtype: int64

Filling Up Null Values


In [6]:
accident['Latitude'] = accident['Latitude'].fillna(accident['Latitude'].mean())
accident['Longitude'] = accident['Longitude'].fillna(accident['Longitude'].mean())
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].fillna("unaccounted")
accident['Road_Type'] = accident['Road_Type'].fillna("unaccounted")
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].fillna(accident['Urban_or_Rural_Area'].mode()[0])
accident['Weather_Conditions'] = accident['Weather_Conditions'].fillna("unaccounted")
In [7]:
accident.isnull().sum()
Out[7]:
Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

Categorical Data


In [8]:
accident['Accident_Severity'] = accident['Accident_Severity'].astype('category')
accident['Light_Conditions'] = accident['Light_Conditions'].astype('category')
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].astype('category')
accident['Road_Type'] = accident['Road_Type'].astype('category')
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].astype('category')
accident['Weather_Conditions'] = accident['Weather_Conditions'].astype('category')
accident['Vehicle_Type'] = accident['Vehicle_Type'].astype('category')
In [9]:
accident.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660683 entries, 0 to 660682
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype   
---  ------                   --------------   -----   
 0   Index                    660683 non-null  object  
 1   Accident_Severity        660683 non-null  category
 2   Accident Date            660683 non-null  object  
 3   Latitude                 660683 non-null  float64 
 4   Light_Conditions         660683 non-null  category
 5   District Area            660683 non-null  object  
 6   Longitude                660683 non-null  float64 
 7   Number_of_Casualties     660683 non-null  int64   
 8   Number_of_Vehicles       660683 non-null  int64   
 9   Road_Surface_Conditions  660683 non-null  category
 10  Road_Type                660683 non-null  category
 11  Urban_or_Rural_Area      660683 non-null  category
 12  Weather_Conditions       660683 non-null  category
 13  Vehicle_Type             660683 non-null  category
dtypes: category(7), float64(2), int64(2), object(3)
memory usage: 39.7+ MB

Cleaning Inconsistencies in the Dataset


In [10]:
# Normalize the date strings: cast to str, trim whitespace, and use '-' as the only separator
accident['Accident Date'] = accident['Accident Date'].astype('str').str.strip()
accident['Accident Date'] = accident['Accident Date'].str.replace('/', '-')
In [11]:
accident['Accident Date'] = pd.to_datetime(accident['Accident Date'], dayfirst = True, errors='coerce')

Extracting Date Information Using Pandas DateTime


In [12]:
accident['Year'] = accident['Accident Date'].dt.year
accident['Month'] = accident['Accident Date'].dt.month
accident['Day'] = accident['Accident Date'].dt.day
accident['DayOfWeek'] = accident['Accident Date'].dt.dayofweek # Monday = 0, Sunday = 6 
In [13]:
accident.isnull().sum()
Out[13]:
Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
Year                       0
Month                      0
Day                        0
DayOfWeek                  0
dtype: int64

DATA EXPLORATION


Conducting data exploration by structuring and analyzing thirty-one (31) key questions


Below are the thirty-one (31) questions that establish the structure of the analytical process

  1. To optimize traffic police patrols, on which day of the week do most accidents occur?

  2. Has there been a year-over-year trend in accident rates? Are our safety measures working?

  3. What percentage of accidents causes at least one serious or fatal injury?

  4. On average, how many people get hurt in each accident, and does this number change when more vehicles are involved?

  5. Is there a specific combination of light and weather conditions that is particularly dangerous?

  6. In each region, do more accidents happen in urban areas or rural areas?

  7. During rainy weather, which type of vehicle is most often involved in accidents?

  8. For our annual review, for each year, what was the district with the highest number of accidents? Has the worst-performing district changed?

  9. When are motorcycle accidents most likely to happen during the week?

  10. Which month of the year has the highest frequency of accidents?

  11. Are there geographic clusters of severe accidents, and is accident severity related to latitude and longitude?

  12. Do accidents on weekends correlate with higher severity than accidents on weekdays?

  13. Which road type has the strongest correlation with higher casualty counts?

  14. Do accidents involving cars, motorcycles, and HGVs differ in the number of other vehicles involved?

  15. Do multiple-vehicle accidents result in more casualties compared to single-vehicle accidents?

  16. How do accident rates fluctuate month-by-month? Is there a predictable seasonal pattern?

  17. Is the problem getting better or worse in rural areas compared to urban ones?

  18. Do locations with more accidents also tend to have more casualties on average?

  19. For accidents involving a bus, what is the relationship with the number of other vehicles?

  20. What are the most common weather conditions during accidents on roundabouts?

  21. Which light conditions are most often associated with fatal accidents?

  22. Which districts have the highest ratio of serious accidents to slight accidents?

  23. What are the top 10 districts with the poorest reporting of road surface conditions?

  24. How many vehicles are usually involved in accidents on dual carriageways?

  25. What is the distribution of the number of casualties in accidents that occur in darkness?

  26. Where should we focus fatal accident prevention programs?

  27. Do most accidents occur during daylight or nighttime hours in high-risk districts?

  28. What are the most common weather conditions during accidents in Kensington and Chelsea District?

  29. Which districts have the highest accident rates during the December holidays?

  30. Which districts experience the most accidents where taxis or private hire vehicles are involved?

  31. Which districts have the highest accident rates on dry road surfaces?


Question 1

To optimize traffic police patrols, on which day of the week do most accidents occur?

In [14]:
# Count accidents per day of week (0=Mon, 6=Sun)
counts = accident['DayOfWeek'].value_counts().reindex(np.arange(7), fill_value=0)

# Label the days
counts.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print(counts)
Mon     72662
Tue     94537
Wed     99541
Thu     99502
Fri     97994
Sat    107162
Sun     89285
Name: count, dtype: int64

INSIGHTS: Saturday has the highest volume of accidents, with 107,162 incidents, while Monday has the lowest, with 72,662. To optimize traffic police patrols for visibility and rapid response, Saturdays should therefore receive priority.


Question 2

Has there been a year-over-year trend in accident rates? Are our safety measures working?

In [15]:
accidents_by_year = accident['Year'].value_counts().sort_index()

print(accidents_by_year)
Year
2019    182115
2020    170591
2021    163554
2022    144423
Name: count, dtype: int64

INSIGHTS: The number of recorded accidents fell every year, from 182,115 in 2019 to 144,423 in 2022. This steady decline is consistent with our safety measures working. The counts alone cannot show that the measures caused the drop, however, since other factors such as changes in traffic volume could also contribute.
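
To put a size on the yearly decline, the totals printed above can be turned into year-over-year percentage changes. The sketch below hard-codes the four totals from the output; it measures the trend only and does not attribute the drop to any particular cause.

```python
import pandas as pd

# Yearly accident totals copied from the output above
accidents_by_year = pd.Series(
    {2019: 182115, 2020: 170591, 2021: 163554, 2022: 144423},
    name='Accidents',
)

# Percentage change relative to the previous year (the first year has no baseline)
yoy_change_pct = (accidents_by_year.pct_change() * 100).round(2)

# Overall change from the first year to the last year
total_change_pct = round(
    (accidents_by_year.iloc[-1] - accidents_by_year.iloc[0])
    / accidents_by_year.iloc[0] * 100,
    2,
)

print(yoy_change_pct)
print(f"Change 2019 to 2022: {total_change_pct}%")
```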


Question 3

What percentage of accidents causes at least one serious or fatal injury?

In [16]:
percentage = ((accident['Accident_Severity'] == 'Fatal') | (accident['Accident_Severity'] == 'Serious')).mean() * 100

print(f"{percentage:.2f}% of accidents involved a fatal or serious injury.")
14.68% of accidents involved a fatal or serious injury.

INSIGHTS: About 14.68% of accidents cause a serious or fatal injury. This means that for every 100 accidents, almost 15 have severe outcomes. Most accidents are minor, but this number shows that dangerous crashes still happen often. This is important to know so we can focus on making roads safer.


Question 4

On average, how many people get hurt in each accident, and does this number change when more vehicles are involved?

In [17]:
overall_avg = round(accident['Number_of_Casualties'].mean(), 2)
avg_by_vehicles = accident.groupby('Number_of_Vehicles')['Number_of_Casualties'].mean().round(2)

overall_avg, avg_by_vehicles
Out[17]:
(np.float64(1.36),
 Number_of_Vehicles
 1      1.17
 2      1.37
 3      1.71
 4      2.00
 5      2.32
 6      2.61
 7      3.06
 8      3.40
 9      3.35
 10     3.63
 11     4.00
 12     2.29
 13     7.83
 14     5.44
 15     5.00
 16     6.00
 19    13.00
 28    16.00
 32     5.00
 Name: Number_of_Casualties, dtype: float64)

INSIGHTS: On average, 1.36 people get hurt in each accident, and this number rises when more vehicles are involved. For example, accidents with just one vehicle have about 1.17 casualties, while those with five vehicles have about 2.32. This makes sense because more vehicles mean more people are at risk in a single crash. The averages for larger vehicle counts jump around (for example, 2.29 for 12 vehicles and 7.83 for 13), which likely reflects how few accidents fall into those groups.


Question 5

Is there a specific combination of light and weather conditions that is particularly dangerous?

In [18]:
# Count accidents for each combination of light and weather conditions
danger = (accident.groupby(['Light_Conditions', 'Weather_Conditions']).size().sort_values(ascending=False).head(10))
danger
Out[18]:
Light_Conditions        Weather_Conditions   
Daylight                Fine no high winds       398654
Darkness - lights lit   Fine no high winds        92045
Daylight                Raining no high winds     49738
Darkness - no lighting  Fine no high winds        24859
Darkness - lights lit   Raining no high winds     22664
Daylight                Other                     10096
                        unaccounted               10042
Darkness - no lighting  Raining no high winds      6205
Daylight                Fine + high winds          5790
                        Raining + high winds       4938
dtype: int64

INSIGHTS: The counts identify the most frequent combination rather than the most dangerous one. The most accidents, almost 400,000, happened in daylight during fine weather with no high winds. The second most frequent combination, about 92,000 accidents, also involved fine weather but occurred in darkness with street lights lit. Most driving takes place in daylight and fine weather, so these combinations naturally dominate the counts. Judging which combination is most dangerous would require exposure data or a comparison of severity across combinations. Driver overconfidence in clear conditions may play a part, but the counts alone cannot confirm it.
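
One way to look at danger rather than frequency is to compare, for each combination, the share of accidents that were serious or fatal. The sketch below uses a small hypothetical DataFrame with the notebook's column names; `Is_Severe` is an illustrative helper column, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy rows; the real analysis would use the `accident` DataFrame
toy = pd.DataFrame({
    'Light_Conditions': ['Daylight'] * 4 + ['Darkness - no lighting'] * 2,
    'Weather_Conditions': ['Fine no high winds'] * 4 + ['Raining no high winds'] * 2,
    'Accident_Severity': ['Slight', 'Slight', 'Slight', 'Serious', 'Fatal', 'Slight'],
})

# Flag accidents with a serious or fatal outcome
toy['Is_Severe'] = toy['Accident_Severity'].isin(['Serious', 'Fatal'])

# Per combination: number of accidents and share of them that were severe
risk = (
    toy.groupby(['Light_Conditions', 'Weather_Conditions'])['Is_Severe']
    .agg(accidents='size', severe_share='mean')
    .sort_values('severe_share', ascending=False)
)
print(risk)
```

On real data, combinations with very few accidents would give unstable shares, so a minimum count per combination may be worth applying before ranking.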


Question 6

In each region, do more accidents happen in urban areas or rural areas?

In [19]:
# Top 5 districts by number of accidents
top_5_districts = accident['District Area'].value_counts().head(5).index

# Urban vs Rural counts for top 5 districts
accident[accident['District Area'].isin(top_5_districts)].groupby(['District Area', 'Urban_or_Rural_Area']).size()
Out[19]:
District Area  Urban_or_Rural_Area
Birmingham     Rural                    134
               Unallocated                0
               Urban                  13357
Bradford       Rural                    796
               Unallocated                0
               Urban                   5416
Leeds          Rural                   1774
               Unallocated                0
               Urban                   7124
Manchester     Rural                    143
               Unallocated                0
               Urban                   6577
Sheffield      Rural                    462
               Unallocated                0
               Urban                   5248
dtype: int64

INSIGHTS: In every one of the top five districts, many more accidents happen in urban areas than in rural areas. For instance, in Birmingham, there were 13,357 urban accidents but only 134 rural ones. This difference is the same for all the other top districts like Leeds, Manchester, and Sheffield, where urban accident numbers are always much higher. Therefore, for these areas, the answer is that more accidents happen in urban places.
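
The zero-count "Unallocated" rows in the output appear because the area column is categorical and `groupby` lists every category by default. A possible tidy-up, sketched on hypothetical toy rows, is to pass `observed=True` and express the urban count as a share of each district's total.

```python
import pandas as pd

# Hypothetical toy rows with the same categorical setup as the notebook
toy = pd.DataFrame({
    'District Area': ['Birmingham', 'Birmingham', 'Birmingham', 'Leeds', 'Leeds'],
    'Urban_or_Rural_Area': pd.Categorical(
        ['Urban', 'Urban', 'Rural', 'Urban', 'Rural'],
        categories=['Rural', 'Unallocated', 'Urban'],
    ),
})

# observed=True keeps only category values that actually occur in the data
counts = toy.groupby(['District Area', 'Urban_or_Rural_Area'], observed=True).size()

# One row per district, one column per area type
table = counts.unstack(fill_value=0)

# Urban accidents as a percentage of each district's total
urban_share_pct = (table['Urban'] / table.sum(axis=1) * 100).round(1)
print(urban_share_pct)
```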


Question 7

During rainy weather, which type of vehicle is most often involved in accidents?

In [20]:
# Top 5 vehicle types in rain accidents
top_5_vehicles = accident.loc[accident['Weather_Conditions'].str.lower().str.contains('rain'), 'Vehicle_Type'].value_counts().head(5)
top_5_vehicles
Out[20]:
Vehicle_Type
Car                                     67134
Van / Goods 3.5 tonnes mgw or under      4711
Bus or coach (17 or more pass seats)     3571
Motorcycle over 500cc                    3488
Goods 7.5 tonnes mgw and over            2387
Name: count, dtype: int64

INSIGHTS: When the weather is rainy, cars are the vehicle type most often involved in accidents. Cars were involved in 67,134 accidents that occurred in the rain. The next highest vehicle type was vans, which were involved in 4,711 accidents. The most likely reason for this is that there are far more cars driving on the roads than any other kind of vehicle.


Question 8

For our annual review, for each year, what was the district with the highest number of accidents? Has the worst-performing district changed?

In [21]:
# Count accidents per district per year
counts = accident.groupby(['Year', 'District Area']).size().reset_index(name='Accidents')

# Get the district with the most accidents each year
worst_district_by_year = counts.loc[counts.groupby('Year')['Accidents'].idxmax()]

worst_district_by_year
Out[21]:
Year District Area Accidents
24 2019 Birmingham 3820
438 2020 Birmingham 3506
852 2021 Birmingham 3289
1268 2022 Birmingham 2876

INSIGHTS: Birmingham was the district with the highest number of accidents every year from 2019 to 2022. The number of accidents in Birmingham decreased each year, from 3,820 in 2019 to 2,876 in 2022. Even though the total number went down over time, Birmingham remained the worst-performing district. This means the district with the most accidents did not change during this period.


Question 9

When are motorcycle accidents most likely to happen during the week?

In [22]:
# Filter motorcycle accidents
motorcycle_accidents = accident[accident['Vehicle_Type'].str.lower().str.contains('motorcycle')]

# Count accidents by day of the week
accidents_by_day = motorcycle_accidents['DayOfWeek'].value_counts().sort_index()

# Label the days
accidents_by_day.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

accidents_by_day
Out[22]:
Mon    6122
Tue    8102
Wed    8404
Thu    8597
Fri    8245
Sat    9131
Sun    7576
Name: count, dtype: int64

INSIGHTS: Motorcycle accidents are most frequent on Saturday, with 9,131 incidents. Counts are also high from Tuesday to Friday, ranging from 8,102 to 8,597. Monday has the fewest motorcycle accidents (6,122), followed by Sunday (7,576). Motorcycle accidents therefore peak on Saturdays and stay high midweek, while Mondays and Sundays are the quietest days.
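
To check whether this weekly pattern is specific to motorcycles or simply follows overall accident volume, the motorcycle counts can be divided by the all-vehicle counts for the same day. The sketch below hard-codes both sets of counts from the outputs of Question 1 and Question 9. A roughly flat share across days would suggest the motorcycle pattern mostly mirrors total volume.

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Counts copied from the outputs of Question 1 (all accidents)
# and Question 9 (motorcycle accidents)
all_accidents = pd.Series([72662, 94537, 99541, 99502, 97994, 107162, 89285], index=days)
motorcycle = pd.Series([6122, 8102, 8404, 8597, 8245, 9131, 7576], index=days)

# Motorcycle accidents as a percentage of all accidents on that day
motorcycle_share_pct = (motorcycle / all_accidents * 100).round(2)
print(motorcycle_share_pct)
```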


Question 10

Which month of the year has the highest frequency of accidents?

In [23]:
accident['Month'] = accident['Accident Date'].dt.month

# Count accidents by month
accidents_by_month = accident['Month'].value_counts().sort_index()
accidents_by_month
Out[23]:
Month
1     52852
2     49482
3     54084
4     51741
5     56348
6     56474
7     57439
8     53909
9     56450
10    59673
11    60409
12    51822
Name: count, dtype: int64

INSIGHTS: November has the highest number of accidents for the year, with 60,409 incidents, followed by October with 59,673. The months with the fewest accidents are February (49,482) and April (51,741). Counts tend to rise from the start of the year to the October and November peak, then fall sharply in December (51,822). The late-autumn peak might be due to darker evenings and more rain.
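
Raw monthly totals favour longer months, so a per-day rate can give a fairer comparison. The sketch below hard-codes the monthly totals from the output above and assumes the data covers every calendar day of 2019 through 2022, which the notebook does not verify.

```python
import calendar
import pandas as pd

# Monthly totals copied from the output above (months 1 to 12)
accidents_by_month = pd.Series(
    [52852, 49482, 54084, 51741, 56348, 56474,
     57439, 53909, 56450, 59673, 60409, 51822],
    index=range(1, 13), name='Accidents',
)

# Assumption: the data covers every day of 2019 through 2022
years = [2019, 2020, 2021, 2022]
days_in_month = pd.Series(
    {m: sum(calendar.monthrange(y, m)[1] for y in years) for m in range(1, 13)}
)

# Average number of accidents per calendar day, by month
accidents_per_day = (accidents_by_month / days_in_month).round(1)
print(accidents_per_day.sort_values(ascending=False))
```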


Question 11

Are there geographic clusters of severe accidents, and is accident severity related to latitude and longitude?

In [24]:
# Check if location is correlated with the number of people hurt.
lat = accident['Latitude'].corr(accident['Number_of_Casualties'])
lon = accident['Longitude'].corr(accident['Number_of_Casualties'])
print(f"Correlation Latitude vs. Casualties: {lat:.3f}")
print(f"Correlation Longitude vs. Casualties: {lon:.3f}")
Correlation Latitude vs. Casualties: 0.032
Correlation Longitude vs. Casualties: -0.040

INSIGHTS: The correlations between location and casualty counts are very weak. Latitude has a small positive correlation of 0.032 with the number of casualties, and longitude has a small negative correlation of -0.040. This means there is no meaningful linear relationship between geographic coordinates and the number of casualties, which is used here as a stand-in for severity. A linear correlation cannot detect local clusters, so the clustering part of the question remains open.


Question 12

Do accidents on weekends correlate with higher severity than accidents on weekdays?

In [25]:
accident['Is_Weekend'] = (accident['DayOfWeek'] >= 5).astype(int) # Assuming Mon=0, Sun=6
correlation = accident['Is_Weekend'].corr(accident['Number_of_Casualties'])
print(f"Correlation between Weekend and Number of Casualties: {correlation:.3f}")
Correlation between Weekend and Number of Casualties: 0.021

INSIGHTS: No, by this measure accidents on weekends are not more severe than accidents on weekdays. The correlation of 0.021 between the weekend flag and the number of casualties shows almost no linear relationship. The number of casualties in an accident is nearly the same whether it happens on a weekday or on a Saturday or Sunday.


Question 13

Which road type has the strongest correlation with higher casualty counts?

In [26]:
road_type_severity = accident.groupby('Road_Type')['Number_of_Casualties'].mean()
print(road_type_severity.sort_values(ascending=False).head(3))
Road_Type
Dual carriageway      1.477298
Slip road             1.423661
Single carriageway    1.344578
Name: Number_of_Casualties, dtype: float64

INSIGHTS: Dual carriageways have the strongest link to higher casualty counts, with an average of about 1.48 casualties per accident. Slip roads are next with 1.42 casualties, followed by single carriageways with 1.34. This means accidents on faster, multi-lane roads like dual carriageways tend to result in more people getting hurt. The road type is related to the severity of the accidents that happen on it.


Question 14

Do accidents involving cars, motorcycles, and HGVs differ in the number of other vehicles involved?

In [27]:
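# Number of vehicles involved, split by the recorded vehicle type
car = accident[accident['Vehicle_Type'] == 'Car']['Number_of_Vehicles']
bike = accident[accident['Vehicle_Type'].str.contains('Motorcycle')]['Number_of_Vehicles']
# Note: 'Goods' also matches 'Van / Goods 3.5 tonnes mgw or under', so this group includes vans
hgv = accident[accident['Vehicle_Type'].str.contains('Goods')]['Number_of_Vehicles']
# One-way ANOVA: do the three groups share the same mean?
f_oneway(car, bike, hgv)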
Out[27]:
F_onewayResult(statistic=np.float64(1.4670124971711498), pvalue=np.float64(0.23061422663996944))

INSIGHTS: The one-way ANOVA finds no statistically significant difference in the number of vehicles involved between accidents recorded for cars, motorcycles, and goods vehicles. The p-value of 0.23 is well above the usual 0.05 threshold, so the small observed differences could be due to random chance. This does not prove that vehicle type has no effect; it only means this test does not detect one.
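
Before relying on a substring filter for the goods-vehicle group, it can help to list which labels the filter actually matches. The sketch below uses vehicle type labels that appear in the outputs of this notebook; the prefix filter is a hypothetical stricter alternative, and whether vans belong in the HGV group is a decision for the analyst.

```python
import pandas as pd

# Vehicle type labels that appear in the outputs above
vehicle_types = pd.Series([
    'Car',
    'Van / Goods 3.5 tonnes mgw or under',
    'Goods 7.5 tonnes mgw and over',
    'Motorcycle over 500cc',
])

# Labels matched by the substring filter used in the cell above
matched_by_contains = vehicle_types[vehicle_types.str.contains('Goods')].tolist()

# A stricter, hypothetical alternative: only labels that begin with 'Goods'
matched_by_prefix = vehicle_types[vehicle_types.str.startswith('Goods')].tolist()

print(matched_by_contains)
print(matched_by_prefix)
```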


Question 15

Do multiple-vehicle accidents result in more casualties compared to single-vehicle accidents?

In [28]:
single = accident[accident['Number_of_Vehicles'] == 1]['Number_of_Casualties']
multiple = accident[accident['Number_of_Vehicles'] > 1]['Number_of_Casualties']
f_oneway(single, multiple)
Out[28]:
F_onewayResult(statistic=np.float64(15030.438521505124), pvalue=np.float64(0.0))

INSIGHTS: Yes, accidents involving multiple vehicles result in more casualties on average than single-vehicle accidents. The test reports a p-value of 0.0, meaning the value is too small to represent in floating point, so the difference is statistically significant and very unlikely to be due to chance. With more than 660,000 records, however, even small differences become significant, so the size of the difference matters as well. The result is likely because multi-vehicle accidents involve more people overall.
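
A very small p-value says the difference is unlikely to be chance, not how large it is. As a complement, the sketch below computes the difference in means and Cohen's d on hypothetical casualty counts; in the notebook, the inputs would be the `single` and `multiple` Series from the cell above.

```python
import numpy as np

# Hypothetical casualty counts; the real inputs would be `single` and `multiple`
single = np.array([1, 1, 1, 2, 1, 1])
multiple = np.array([1, 2, 1, 3, 2, 1])

# Difference between the group means
mean_diff = multiple.mean() - single.mean()

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(single), len(multiple)
pooled_var = (
    (n1 - 1) * single.var(ddof=1) + (n2 - 1) * multiple.var(ddof=1)
) / (n1 + n2 - 2)
cohens_d = mean_diff / np.sqrt(pooled_var)

print(f"Mean difference: {mean_diff:.2f}, Cohen's d: {cohens_d:.2f}")
```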


Question 16

How do accident rates fluctuate month-by-month? Is there a predictable seasonal pattern?

In [29]:
accidents_by_month = accident['Month'].value_counts().sort_index()

plt.plot(accidents_by_month.index, accidents_by_month.values, marker='o')
plt.title('Seasonality of Accidents (Monthly Trend)')
plt.xlabel('Month')
plt.ylabel('Number of Accidents')
plt.xticks(range(1,13))
plt.grid(True)
plt.show()
[Figure: line chart "Seasonality of Accidents (Monthly Trend)"; x-axis: Month, y-axis: Number of Accidents]

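INSIGHTS: Accident counts vary by month, but they do not peak in winter. Counts are highest in October and November (59,673 and 60,409) and lowest in February (49,482), with April, December, and January also low. The summer months of May to July sit in between, at roughly 56,000 to 57,000 each. Because the chart pools all four years, it shows the average monthly profile but cannot confirm that the same pattern repeats every year.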


Question 17

Is the problem getting better or worse in rural areas compared to urban ones?

In [30]:
rural_trend = accident[accident['Urban_or_Rural_Area'] == 'Rural'].groupby('Year').size()
urban_trend = accident[accident['Urban_or_Rural_Area'] == 'Urban'].groupby('Year').size()

plt.plot(urban_trend.index, urban_trend.values, marker='o', label='Urban')
plt.plot(rural_trend.index, rural_trend.values, marker='s', label='Rural')
plt.title('Trend of Accidents: Urban vs. Rural')
plt.xlabel('Year')
plt.ylabel('Number of Accidents')
plt.legend()
plt.grid(True)
plt.show()
[Figure: line chart "Trend of Accidents: Urban vs. Rural"; x-axis: Year, y-axis: Number of Accidents]

INSIGHTS: Yes, the problem is getting worse in rural areas. Urban accidents are going down, but rural accidents are going up. This means the gap between rural and urban safety is growing.
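
A reading taken from a line chart is easier to verify with the underlying numbers. The sketch below tabulates accidents per year and area type and computes the percentage change from the first to the last year. The rows are hypothetical toy data, so the printed figures say nothing about the real trend.

```python
import pandas as pd

# Hypothetical toy rows; the real analysis would use the `accident` DataFrame
toy = pd.DataFrame({
    'Year': [2019, 2019, 2019, 2019, 2022, 2022, 2022],
    'Urban_or_Rural_Area': ['Urban', 'Urban', 'Rural', 'Rural', 'Urban', 'Rural', 'Rural'],
})

# Accidents per year (rows) and area type (columns)
by_year_area = pd.crosstab(toy['Year'], toy['Urban_or_Rural_Area'])

# Percentage change from the first year to the last year, per area type
first, last = by_year_area.iloc[0], by_year_area.iloc[-1]
change_pct = ((last - first) / first * 100).round(1)

print(by_year_area)
print(change_pct)
```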


Question 18

Do locations with more accidents also tend to have more casualties on average?

In [31]:
# Group by rounded lat and long
cluster_stats = accident.groupby([accident['Latitude'].round(2), accident['Longitude'].round(2)])

# Count and mean in one step
counts = cluster_stats['Index'].count()
casualties = cluster_stats['Number_of_Casualties'].mean()

# Scatter plot
plt.scatter(counts, casualties, alpha=0.5)
plt.title('Accidents vs Average Casualties by Location')
plt.xlabel('Accidents')
plt.ylabel('Average Casualties')
plt.show()
[Figure: scatter plot "Accidents vs Average Casualties by Location"; x-axis: Accidents, y-axis: Average Casualties]

INSIGHTS: No. Judging from the scatter plot, the number of accidents at a location is not related to how severe they are. Places with many crashes have about the same average casualties as places with few crashes.
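
The visual impression from a scatter plot can be backed by a single number. The sketch below groups hypothetical toy rows into location cells, then computes the Pearson correlation between the accident count and the mean casualties per cell. The coordinates and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows; coordinates are already rounded to two decimals
toy = pd.DataFrame({
    'Latitude': [51.50, 51.50, 51.50, 52.48, 52.48, 53.80],
    'Longitude': [-0.12, -0.12, -0.12, -1.90, -1.90, -1.55],
    'Number_of_Casualties': [1, 2, 3, 1, 1, 4],
})

# Per location cell: accident count and mean casualties
cells = toy.groupby(['Latitude', 'Longitude'])['Number_of_Casualties'].agg(
    accidents='count', avg_casualties='mean'
)

# Pearson correlation between accident count and mean casualties across cells
r = cells['accidents'].corr(cells['avg_casualties'])
print(cells)
print(f"Correlation: {r:.3f}")
```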


Question 19

For accidents involving a bus, what is the relationship with the number of other vehicles?

In [32]:
bus_accidents = accident[accident['Vehicle_Type'].str.contains('Bus', na=False)]

plt.scatter(bus_accidents['Number_of_Vehicles'], bus_accidents['Number_of_Casualties'], alpha=0.3, s=20, color='purple')
plt.title('Bus Accidents: Vehicles vs Casualties')
plt.xlabel('Vehicles Involved')
plt.ylabel('Casualties')
plt.show()
[Figure: scatter plot "Bus Accidents: Vehicles vs Casualties"; x-axis: Vehicles Involved, y-axis: Casualties]

INSIGHTS: For bus accidents, more vehicles usually mean more casualties. The scatter plot shows a clear upward trend. When a bus is involved in a crash with more vehicles, the number of casualties tends to be higher. This indicates that multi-vehicle collisions are generally more severe.


Question 20

What are the most common weather conditions during accidents on roundabouts?

In [33]:
roundabout_weather = accident[accident['Road_Type'] == 'Roundabout']['Weather_Conditions'].value_counts().head(8)

plt.bar(roundabout_weather.index, roundabout_weather.values, color='lightblue')
plt.title('Weather in Roundabout Accidents')
plt.xlabel('Weather')
plt.ylabel('Accidents')
plt.xticks(rotation=90)
plt.show()
[Figure: bar chart "Weather in Roundabout Accidents"; x-axis: Weather, y-axis: Accidents]

INSIGHTS: Most accidents on roundabouts occur in fine weather with no strong winds. The second most common condition is rainy weather without high winds. This tells us that driver error in normal conditions may be a bigger factor than bad weather.


Question 21

Which light conditions are most often associated with fatal accidents?

In [34]:
fatal_light = accident[accident['Accident_Severity'] == 'Fatal']['Light_Conditions'].value_counts().head(6)

plt.bar(fatal_light.index, fatal_light.values, color='black')
plt.title('Fatal Accidents by Light Condition')
plt.xlabel('Light Condition')
plt.ylabel('Fatal Accidents')
plt.xticks(rotation=90)
plt.show()
[Figure: bar chart "Fatal Accidents by Light Condition"; x-axis: Light Condition, y-axis: Fatal Accidents]

INSIGHTS: Most fatal accidents happen in daylight. Darkness with street lights is also common, which means fatal crashes are not just caused by poor visibility.


Question 22

Which districts have the highest ratio of serious accidents to slight accidents?

In [35]:
counts = accident.groupby(['District Area', 'Accident_Severity']).size().unstack(fill_value=0)
severity_ratio = ((counts['Serious'] + counts['Fatal']) / counts['Slight']).nlargest(15)

plt.barh(severity_ratio.index, severity_ratio.values, color='darkred')
plt.title('Top Districts: Serious+Fatal vs Slight Accidents')
plt.xlabel('Ratio')
plt.show()
[Figure: horizontal bar chart "Top Districts: Serious+Fatal vs Slight Accidents"; x-axis: Ratio]

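INSIGHTS: Selly, Maldon, and Purbeck have the highest ratio of serious and fatal accidents to slight accidents. This suggests that crashes there are more likely to be severe. A high ratio in a district with few accidents can be unstable, so these rankings should be read alongside the accident counts.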
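
Ratios computed from very small counts can swing widely, and a district with no slight accidents produces a division by zero. One possible safeguard, sketched below on hypothetical counts, is to keep only districts above a minimum number of accidents before ranking. The threshold `MIN_ACCIDENTS` and the district names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical severity counts per district (rows) and severity (columns)
counts = pd.DataFrame(
    {'Fatal': [1, 20, 0], 'Serious': [2, 180, 1], 'Slight': [3, 1800, 0]},
    index=['Small district', 'Large district', 'Tiny district'],
)

# Assumed threshold: ignore districts with fewer than 100 accidents in total
MIN_ACCIDENTS = 100
enough_data = counts[counts.sum(axis=1) >= MIN_ACCIDENTS]

# Ratio of serious plus fatal accidents to slight accidents
severity_ratio = (enough_data['Serious'] + enough_data['Fatal']) / enough_data['Slight']
print(severity_ratio.sort_values(ascending=False))
```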


Question 23

What are the top 10 districts with the poorest reporting of road surface conditions?

In [36]:
outlier_districts = (
    accident[accident['Road_Surface_Conditions'] == 'unaccounted']
    .groupby('District Area')
    .size()
    / accident.groupby('District Area').size() * 100
)

top10 = outlier_districts.nlargest(10)

plt.barh(top10.index, top10.values, color="grey")
plt.title('Top 10 Districts with Most "Unaccounted" Road Surface Data')
plt.xlabel('Percentage (%)')
plt.show()
[Figure: horizontal bar chart "Top 10 Districts with Most "Unaccounted" Road Surface Data"; x-axis: Percentage (%)]

INSIGHTS: Lincoln has the worst reporting for road surface data. North Kesteven and Gedling are also poor. These areas often list road conditions as "unaccounted." This missing data makes safety analysis difficult there.


Question 24

How many vehicles are usually involved in accidents on dual carriageways?

In [37]:
dual_accidents = accident[accident['Road_Type'] == 'Dual carriageway']

plt.hist(dual_accidents['Number_of_Vehicles'], bins=7, color='red')
plt.title('Vehicles in Dual Carriageway Accidents')
plt.xlabel('Vehicles Involved')
plt.ylabel('Count')
plt.grid(True, alpha=0.3)
plt.show()
[Figure: histogram "Vehicles in Dual Carriageway Accidents"; x-axis: Vehicles Involved, y-axis: Count]

INSIGHTS: Most accidents on dual carriageways involve 2 vehicles. This is by far the most common number. The next most common are 1-vehicle and then 3-vehicle accidents.
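
Because the number of vehicles is a whole number, a histogram with a fixed number of bins can merge neighbouring values into one bar. An exact frequency table avoids that. The sketch below uses a hypothetical Series; in the notebook, the input would be `dual_accidents['Number_of_Vehicles']`.

```python
import pandas as pd

# Hypothetical vehicle counts for dual carriageway accidents
vehicles = pd.Series([2, 2, 1, 3, 2, 1, 2, 4, 2, 3], name='Number_of_Vehicles')

# Exact frequency of each vehicle count, ordered by the count itself
frequency = vehicles.value_counts().sort_index()

# Typical values: the most frequent count and the median
most_common = vehicles.mode()[0]
median = vehicles.median()

print(frequency)
print(f"Most common: {most_common}, median: {median}")
```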


Question 25

What is the distribution of the number of casualties in accidents that occur in darkness?

In [38]:
darkness_accidents = accident[accident['Light_Conditions'].str.startswith('Darkness', na=False)]
plt.figure(figsize=(10, 6))
plt.hist(darkness_accidents['Number_of_Casualties'], bins=20, color='midnightblue', edgecolor='white')
plt.title('Distribution of Casualties in Nighttime Accidents')
plt.xlabel('Number of Casualties')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
plt.show()
[Figure: histogram "Distribution of Casualties in Nighttime Accidents"; x-axis: Number of Casualties, y-axis: Frequency]

INSIGHTS: Most accidents in darkness have only 1 casualty. The vast majority involve very few people. Severe crashes with many casualties at night are rare.


Question 26

Where should we focus fatal accident prevention programs?

In [39]:
fatal_by_district = accident[accident['Accident_Severity'] == 'Fatal']['District Area'].value_counts().head(10)
plt.pie(fatal_by_district.values, labels=fatal_by_district.index, autopct='%1.1f%%')
plt.title('Top 10 Districts with Fatal Accidents')
plt.show()
[Figure: pie chart "Top 10 Districts with Fatal Accidents"]

INSIGHTS: Birmingham is the most important place to start because it accounts for 22.7% of the fatal accidents recorded in the ten districts shown, much higher than any other district. The next most critical areas are Leeds (12.6%), Highland (11.2%), and East Riding of Yorkshire (10.2%). Together these four districts make up 56.7% of the top-10 total, so prevention programs should focus on them first. Note that the pie chart percentages are shares of the top-10 total, not of all fatal accidents in the dataset.

The districts at the bottom of the list, such as Powys (6.7%) and Bradford (7.1%), have fewer fatal accidents, so prevention programs there would have a smaller impact.
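A pie chart of the top 10 districts reports each district's share of the top-10 total, which is larger than its share of all fatal accidents. The sketch below shows both denominators side by side; it assumes the column names used above, and the `sample` rows and district labels are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `accident` DataFrame (illustrative values only).
sample = pd.DataFrame({
    'Accident_Severity': ['Fatal'] * 6 + ['Slight', 'Serious'],
    'District Area': ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'B'],
})

fatal = sample[sample['Accident_Severity'] == 'Fatal']
fatal_counts = fatal['District Area'].value_counts()

# Top districts only (the notebook uses head(10); 2 keeps this example small).
top_n = fatal_counts.head(2)

# Share within the plotted districts: this is what the pie chart's autopct displays.
share_within_top = top_n / top_n.sum() * 100

# Share of ALL fatal accidents: the figure needed for a dataset-wide claim.
share_of_all_fatal = top_n / len(fatal) * 100

print(pd.DataFrame({'within top N (%)': share_within_top.round(1),
                    'of all fatal (%)': share_of_all_fatal.round(1)}))
```

Running the same two divisions on the real data would show how much of the fatal total the leading districts actually cover.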


Question 27

Do most accidents occur during daylight or nighttime hours in high-risk districts?

In [40]:
top3 = accident['District Area'].value_counts().head(3).index

# Light-condition counts for each of the three districts with the most accidents
light_counts = {d: accident.loc[accident['District Area'] == d, 'Light_Conditions'].value_counts().head(6)
                for d in top3}

# Plot one pie chart per district
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (d, counts) in zip(axes, light_counts.items()):
    counts.plot.pie(autopct='%1.1f%%', ax=ax, title=f'{d} District')
plt.suptitle('Light Conditions by District')
plt.tight_layout()
plt.show()

# Print the counts behind each chart
print("Light Conditions by District:\n" + "="*40)
for d, counts in light_counts.items():
    print(f"\n{d}:\n{counts}")
Light Conditions by District:
========================================

Birmingham:
Light_Conditions
Daylight                       9667
Darkness - lights lit          3672
Darkness - lighting unknown      82
Darkness - lights unlit          51
Darkness - no lighting           19
Name: count, dtype: int64

Leeds:
Light_Conditions
Daylight                       6482
Darkness - lights lit          1992
Darkness - lighting unknown     251
Darkness - no lighting          162
Darkness - lights unlit          11
Name: count, dtype: int64

Manchester:
Light_Conditions
Daylight                       4610
Darkness - lights lit          1878
Darkness - lighting unknown     202
Darkness - no lighting           18
Darkness - lights unlit          12
Name: count, dtype: int64

INSIGHTS: The three highest-risk districts, measured by total accident count, are Birmingham, Leeds, and Manchester. In Birmingham, 9,667 accidents occurred in daylight compared to 3,824 in darkness. Similarly, Leeds had 6,482 daytime accidents versus 2,416 in darkness, and Manchester had 4,610 in daylight compared to 2,110 in darkness.

Accident prevention programs should focus mainly on daytime conditions, since most incidents occur during the day, though nighttime safety remains important.
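The darkness totals quoted above come from adding the four "Darkness" sub-categories in the printed output. A short sketch of that arithmetic, using the counts copied from the output above; the `summary` table and the `Daylight %` column are additions for illustration, not part of the original notebook.

```python
import pandas as pd

# Counts copied from the printed Light_Conditions output for the three districts.
counts = pd.DataFrame(
    {
        'Birmingham': [9667, 3672, 82, 51, 19],
        'Leeds': [6482, 1992, 251, 11, 162],
        'Manchester': [4610, 1878, 202, 12, 18],
    },
    index=['Daylight', 'Darkness - lights lit', 'Darkness - lighting unknown',
           'Darkness - lights unlit', 'Darkness - no lighting'],
)

# Boolean mask selecting every "Darkness - ..." row.
dark_rows = counts.index.str.startswith('Darkness')

# One row per district: daylight count and combined darkness count.
summary = pd.DataFrame({
    'Daylight': counts.loc['Daylight'],
    'Darkness': counts.loc[dark_rows].sum(),
})

# Daylight share of all accidents with a recorded light condition.
summary['Daylight %'] = (summary['Daylight'] / (summary['Daylight'] + summary['Darkness']) * 100).round(1)

print(summary)
```

The percentage column makes the three districts directly comparable even though their totals differ.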


Question 28

What are the most common weather conditions during accidents in Kensington and Chelsea District?

In [41]:
# Name the district explicitly instead of relying on it being the first row of the dataset
district_name = 'Kensington and Chelsea'
district_data = accident[accident['District Area'] == district_name]
plt.figure(figsize=(10, 10)) 
weather_impact = district_data['Weather_Conditions'].value_counts().head(8)
plt.pie(weather_impact.values, labels=weather_impact.index, autopct='%1.1f%%')
plt.title(f'Weather Conditions for Accidents in {district_name} District')
plt.show()

print(f"Weather Conditions for Accidents in {district_name} District")
total_accidents = weather_impact.sum()
for condition, count in weather_impact.items():
    percentage = (count / total_accidents) * 100
    print(f"{condition}: {percentage:.1f}%")
Weather Conditions for Accidents in Kensington and Chelsea District
Fine no high winds: 84.7%
Raining no high winds: 12.3%
Other: 1.8%
Fine + high winds: 0.5%
Snowing no high winds: 0.4%
Raining + high winds: 0.3%
unaccounted: 0.1%
Fog or mist: 0.0%

INSIGHTS: Accidents in Kensington and Chelsea are concentrated under a single weather condition: fine weather with no high winds, which accounts for 84.7% of accidents. The next most common condition, raining with no high winds, is far less frequent at 12.3%. Every other condition, including high winds, fog, and snow, accounts for less than 2% of incidents, and together they make up only about 3%.


Question 29

Where are the districts with the highest accident counts during the December holidays?

In [42]:
holiday_data = accident[accident['Month'] == 12]

# Print district information
december_districts = holiday_data['District Area'].value_counts()
print("Accidents per district in December holidays:")
print(december_districts)

m8 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(holiday_data[['Latitude', 'Longitude']].dropna(), radius=12).add_to(m8)
m8 
Accidents per district in December holidays:
District Area
Birmingham            1097
Leeds                  702
Manchester             513
Bradford               505
Sheffield              484
                      ... 
Western Isles           10
Berwick-upon-Tweed      10
Teesdale                 9
Orkney Islands           8
Clackmannanshire         5
Name: count, Length: 422, dtype: int64

INSIGHTS: Birmingham is the main hotspot for December holiday accidents with 1,097 cases and needs urgent safety action. Leeds (702 accidents) and Manchester (513 accidents) are next in priority. These three areas should get focused December safety measures like more traffic patrols, road safety campaigns, and public reminders since they have the most holiday accidents.
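Raw December counts favor districts that have many accidents all year. One possible way to check whether December itself stands out is to compare each district's December accidents with its own yearly total. The sketch below assumes the `Month` column used in the cell above; the `sample` values are made up, and the `december_share` name is introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `accident` DataFrame (illustrative values only).
sample = pd.DataFrame({
    'District Area': ['Birmingham'] * 4 + ['Teesdale'] * 2,
    'Month': [12, 1, 6, 7, 12, 3],
})

# December accidents per district (same filter as the notebook cell).
december_counts = sample.loc[sample['Month'] == 12, 'District Area'].value_counts()

# All accidents per district, used as the denominator.
total_counts = sample['District Area'].value_counts()

# Percentage of each district's accidents that fall in December.
# Districts with no December accidents come out as NaN and are set to 0.
december_share = (december_counts / total_counts * 100).fillna(0).sort_values(ascending=False)

print(december_share.round(1))
```

In this toy data both districts have one December accident, but the smaller district has the larger December share, which a count-only ranking would not reveal.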


Question 30

Which districts experience the most accidents where taxis or private hire vehicles are involved?

In [43]:
taxi_accidents = accident[accident['Vehicle_Type'].str.contains('Taxi|Private hire', na=False)]
taxi_districts = taxi_accidents['District Area'].value_counts()
print("Taxi accidents per district:")
print(taxi_districts)

m14 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(taxi_accidents[['Latitude', 'Longitude']].dropna(), radius=15).add_to(m14)
m14
Taxi accidents per district:
District Area
Birmingham                   504
Westminster                  219
Glasgow City                 142
Leeds                        135
Kensington and Chelsea       132
                            ... 
North Shropshire               2
Blaeu Gwent                    2
Clackmannanshire               2
Chester-le-Street              1
London Airport (Heathrow)      1
Name: count, Length: 421, dtype: int64

INSIGHTS: Birmingham has the most taxi-related accidents with 504 incidents. Westminster is a distant second with 219 accidents, followed by Glasgow City (142), Leeds (135), and Kensington and Chelsea (132). Safety measures for taxis and private hire vehicles should be prioritized in Birmingham first, then expanded to the other high-incident districts.


Question 31

Which districts have the lowest accident counts on dry road surfaces?

In [44]:
dry_roads = accident[accident['Road_Surface_Conditions'] == 'Dry']
dry_districts = dry_roads['District Area'].value_counts()
print("Dry roads accidents by district:")
print(dry_districts)

m19 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(dry_roads[['Latitude', 'Longitude']].dropna(), radius=12).add_to(m19)
m19
Dry roads accidents by district:
District Area
Birmingham          9367
Leeds               6397
Westminster         4596
Manchester          4493
Bradford            4047
                    ... 
Teesdale              85
Western Isles         78
Clackmannanshire      56
Shetland Islands      49
Orkney Islands        49
Name: count, Length: 422, dtype: int64

INSIGHTS: The districts with the fewest dry-road accidents are Orkney Islands and Shetland Islands, each with 49 incidents, followed by Clackmannanshire (56), Western Isles (78), and Teesdale (85). These are raw counts rather than rates, so they partly reflect how small these districts are.

Birmingham has the highest count with 9,367 accidents, which requires greater priority.
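The lowest counts can also be pulled out directly instead of being read from the tail of the printed output, which matters when districts tie, as Orkney Islands and Shetland Islands do. A minimal sketch, assuming the column names above and using invented `sample` values:

```python
import pandas as pd

# Hypothetical stand-in for the `accident` DataFrame (illustrative values only).
sample = pd.DataFrame({
    'District Area': ['Birmingham'] * 5 + ['Orkney Islands', 'Shetland Islands']
                     + ['Teesdale'] * 2,
    'Road_Surface_Conditions': ['Dry', 'Dry', 'Dry', 'Wet or damp', 'Dry',
                                'Dry', 'Dry', 'Dry', 'Dry'],
})

# Dry-road accidents per district (same filter as the notebook cell).
dry_counts = sample[sample['Road_Surface_Conditions'] == 'Dry']['District Area'].value_counts()

# keep='all' returns every district tied for the lowest value,
# so a tie is not silently cut off at n rows.
lowest = dry_counts.nsmallest(1, keep='all')

print(lowest)
```

With the real data, `dry_districts.nsmallest(5, keep='all')` would list the five lowest districts plus any ties at the cutoff.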


Data Analysis by: Jomel Tomeo

Institution/Organization: Dalubhasaan ng Lungsod ng Lucena

Date: September 04, 2025

© 2025 All rights reserved. This analysis is prepared exclusively for academic and professional purposes. No part of this work may be reproduced, distributed, or transmitted without prior permission.